Abstract

The approximation of a general $d$-variate function $f$ by the shifts $\phi(\cdot - \xi)$,
$\xi \in \Xi \subset \mathbb{R}^d$, of a fixed function $\phi$ occurs in many applications such as
data fitting, neural networks, and learning theory. When $\Xi = h\mathbb{Z}^d$ is a dilate of the
integer lattice, there is a rather complete understanding of the approximation problem
[6, 18] using Fourier techniques. However, in most applications the center set $\Xi$
is either given, or can be chosen with complete freedom. In both of these cases,
the shift-invariant setting is too restrictive. This paper studies the approximation
problem in the case $\Xi$ is arbitrary. It establishes approximation theorems whose
error bounds reflect the local density of the points in $\Xi$. Two different settings are
analyzed. The first is when the set $\Xi$ is prescribed in advance. In this case, the
theorems of this paper show that, in analogy with the classical univariate spline
approximation, improved approximation occurs in regions where the density is high.
The second setting corresponds to the problem of non-linear approximation. In that
setting the set $\Xi$ can be chosen using information about the target function $f$. We
discuss how to 'best' make these choices and give estimates for the approximation
error.



AMS subject classification: 42C40, 46B70, 26B35, 42B25 

Key Words: image/signal processing, computation, nonlinear approximation, optimal
approximation, radial basis functions, scattered data, thin-plate splines, surface
splines, approximation order

*This work has been supported by the Office of Naval Research Contracts ONR-N00014-03-1-0051,
ONR/DEPSCoR N00014-03-1-0675, ONR/DEPSCoR N00014-05-1-0715; the Army Research Office
Contracts DAAD 19-02-1-0028, W911NF-05-1-0227, and W911NF-07-1-0185; the National Institute of
General Medical Sciences under Grant NIH-1-R01-GM072000-01; the National Science Foundation under
Grants DMS-0221642, DMS-9872890, DMS-354707, DBI-9983114, ANI-0085984 and DMS-0602837






1 Introduction 



The mathematical problem of data fitting in the $d$-variate Euclidean space $\mathbb{R}^d$ has vast
applications in science and engineering. Many algorithms address this problem by
approximating the data by a linear combination $F = \sum_{\xi \in \Xi} c(\xi)\phi(\cdot - \xi)$,
with $\Xi \subset \mathbb{R}^d$, and $\phi$ a carefully-selected, often radial, function defined on
$\mathbb{R}^d$. One of the primary motivations for this approach is that if the data themselves are
defined on $\Xi$ and $\phi$ is chosen properly, then there is a unique function related to the above
that interpolates the data [23, 29, 30]. For example, if $\phi$ is the so-called surface spline (a
fundamental solution of the $m$-fold iterated Laplacian) and $\Xi$ is a given finite set of points in
$\mathbb{R}^d$, then for given data $(\xi, y_\xi)$, $\xi \in \Xi$, there is a unique interpolant to the
data from the span $S_\Xi(\phi)$ of the $\phi(\cdot - \xi)$, $\xi \in \Xi$.^1 We refer to the book [32]
for more details on radial basis functions in general, and their use in interpolation, in particular.

The problem of estimating the interpolation error in the above setting was studied
extensively in the literature. We refer the reader to [21, 33], where the interpolated
function $f$ is assumed to come from the so-called "native space" (this approach originated
in the work of Duchon, [13, 14]) and to the more recent [28, 35, 36, 24], where error
estimates are established for general smooth functions in Sobolev spaces. It should be
noted that the interpolation problem is usually analyzed for functions defined on bounded
subdomains of $\mathbb{R}^d$, and it is well-understood that the interpolation error for this setup
suffers significantly from the so-called "boundary effect". Typically, the rates of decay of
the error for smooth functions are about half the corresponding decay rates that are valid
in the boundary-free shift-invariant case, [19, 17].

Interpolation is not necessarily the best approach to the data fitting problem for
various reasons including possible noise in the data, the computational overhead, and
possible lack of stability in the algorithms. If the data are given by a function $f$ defined
on $\mathbb{R}^d$, i.e. $y_\xi = f(\xi)$ (or $y_\xi \approx f(\xi)$ in the noisy version of the problem),
the primary question is how well $f$ can be approximated in a given metric (typically $L_p$-norms)
from the given information. This is governed, and in part determined, by the related question
of how well $f$ can be approximated from the span of the translates $\phi(\cdot - \xi)$,
$\xi \in \Xi$, in the given metric.

In this paper, we shall be solely concerned with the latter approximation problem. We
start with a countable set $\Xi$ of points in $\mathbb{R}^d$ and define $S_\Xi(\phi)$ to be the set of
all functions which are finite linear combinations of the shifts $\phi(\cdot - \xi)$, $\xi \in \Xi$.
We are interested in how well a given function $f \in L_p(\mathbb{R}^d)$ can be approximated (in the
$L_p$-norm) by the elements of $S_\Xi(\phi)$ (more precisely by elements in the closure of this space
in the given metric).

^1 The interpolant is actually selected from a space of the form $S'_\Xi(\phi) \oplus P$, with
$S'_\Xi(\phi)$ a certain subspace of $S_\Xi(\phi)$, and $P$ a finite dimensional space of polynomials
that depends only on $\phi$.

Such approximation problems have been well studied, especially in the case that $\Xi$ is a
dilate of the integer lattice ($\Xi = h\mathbb{Z}^d$ with $h > 0$), [7, 3, 8, 6, 27, 18]. The case where
the centers $\Xi$ are scattered was studied in [1, 15, 2, 20, 34]. The error bounds in all
these references are given in terms of a global mesh density parameter. In contrast, error
bounds that depend on the local density of the scattered centers (i.e., provide improved
error bounds on subdomains that contain dense clusters of centers) are less studied and
less understood, even though this is often the natural setting in applications. The most
notable exception is, of course, spline approximation in one variable. The fact that
the error bounds in linear approximation by splines reflect the local mesh ratio, [5], is
a key property of spline approximation. Furthermore, the development and analysis of
non-linear approximation schemes for univariate splines [25, 10, 11] presented the first
challenge for the development of the substantial theory of non-linear approximation. In
more than one variable, however, far less is known. We refer to [26], where low-rate
strongly local error estimates are established, and to the approximation scheme based on
the "power function approach" in [33].

We shall consider two types of problems for scattered center approximation. In the
first, we assume that the set $\Xi$ is fixed and we derive results that show improved
approximation in regions where the density is high. These results are described in §3 and §5.
The second setting that we consider allows the centers to be chosen dependent on the
function $f$. The basic goal is to establish error bounds that depend on the cardinality $N$
of the chosen center set $\Xi$. This is a form of nonlinear approximation known as $N$-term
approximation which has been well studied in other settings, primarily for wavelet bases.
Our result here is similar to the results on nonlinear wavelet approximation. We show
that a function can be approximated in $L_p(\mathbb{R}^d)$ with error $O(N^{-s/d})$ once it lies in the
Triebel-Lizorkin space $F^s_{\tau,q}(\mathbb{R}^d)$, where $s$, $p$, and $\tau$ are related (as in the
Sobolev embedding theorem) by $\frac{1}{\tau} - \frac{1}{p} = \frac{s}{d}$ and $q = (1 + \frac{s}{d})^{-1}$.
From this result and standard embeddings for Triebel-Lizorkin spaces, we derive corresponding
theorems for $N$-term approximation in terms of the Besov classes. While our actual results in this
direction are close in nature to the wavelet results, the non-linear approximation algorithm that
leads to the above error bounds differs from its wavelet counterpart: the thresholding algorithm
that is employed in the wavelet case is sub-optimal in the present setting. As such, we introduce
and analyse a more sophisticated algorithm. Details of this result are given in §6.

We begin in the following section by describing the assumptions we make about $\phi$ and
$\Xi$. We then give our first results for the linear approximation problem in §3. In §4 we
recall some results about wavelet decompositions and the use of such decompositions in
the characterization of smoothness spaces (Triebel-Lizorkin spaces; they include the more
standard Sobolev spaces). We stress that our paper is not concerned with wavelets: we
merely use wavelets as a tool for defining our approximation schemes. In §5, we complete
our study of the linear approximation problem. Finally, we prove in §6 our results on
nonlinear approximation.

We shall treat approximation on all of $\mathbb{R}^d$. In most applications, one would be
interested in the case of approximation on a compact domain $\Omega$ of $\mathbb{R}^d$. Results on
domains can be derived easily from our results (if the approximand $f$ is defined on a domain that
includes $\Omega$ in its interior, and if one agrees to allow centers outside $\Omega$) but we do not
pursue this here.

2 The setting 

We describe in this section the setting that will be analyzed in the first part of the paper.
There are two main ingredients in our setting. The first is the set $\Xi$ of centers which can
be allowed. We do not make direct assumptions on the geometry of the set $\Xi$: almost any
set $\Xi$ will do. Once the set $\Xi$ is given, we associate it with a density function $h$ on
$\mathbb{R}^d$.

The value $h(t)$ of the density function depends strongly on the local density of $\Xi$ around
$t$: roughly speaking, there should be $L$ centers of $\Xi$ in a ball of radius $L'h(t)$ centered at
$t$, with $L$, $L'$ dependent only on $\phi$ and some parameters that we choose for our approximation
scheme (and that we fix throughout). The density $h(t)$ depends also on the geometry of
the centers around $t$. This dependence is generally mild; also, our assumptions never spell
out this dependence explicitly. It is embedded implicitly in other assumptions.

We assume that our set $\Xi$ of centers is a countable set in $\mathbb{R}^d$ and

A1: (i) Any finite ball in $\mathbb{R}^d$ contains a finite number of points from $\Xi$.

(ii) For each integer $n$, there is an $R = R(n)$ such that each ball of radius $R$
contains at least $n$ points from $\Xi$.

Property (i) prevents the occurrence of accumulation points in the set $\Xi$ while property (ii)
prohibits the existence of arbitrarily large regions on which there are no centers. Neither
of the two conditions in A1 is essential, and the entire assumption is adopted merely in
order to simplify the presentation and the analysis.

The first essential ingredient in our setting is the identification of the class of basis
functions $\phi$ that our analysis applies to. We will make assumptions on $\phi$ that are
convenient and make the ensuing analysis as transparent as possible and yet general enough to
be valid for certain (but not all) types of multivariate splines and radial basis functions.
We assume that $\phi$ is a locally integrable function which, when viewed on all of $\mathbb{R}^d$, is
a tempered distribution. We denote by

$$ S_\Xi(\phi) $$

the finite linear span of the functions $\phi(\cdot - \xi)$, $\xi \in \Xi$. We put forward three
assumptions: one on $\phi$, one on $\Xi$, and one that connects $\phi$ and $\Xi$. At the end of this
section, we analyse these assumptions for specific choices of $\phi$.

Let $C_c^\infty := C_c^\infty(\mathbb{R}^d)$ denote the set of all functions that are infinitely
differentiable with compact support (test functions) and $C_c^k$ the analogous spaces of $k$-times
continuously differentiable functions with compact support ($C_c := C_c^0$). Our first assumption
about $\phi$ is:

A2: There is an integer $\kappa > 0$ and a linear operator $T$ mapping $C_c^\kappa$ into $C_c$,
such that for all $f \in C_c^\kappa$,

$$ f(x) = \int_{\mathbb{R}^d} T(f)(t)\, \phi(x - t)\, dt, \qquad x \in \mathbb{R}^d. \tag{2.1} $$

Note that the integration is well-defined: since $\phi$ is locally in $L_1$, and $T(f) \in C_c$, we
have that $T(f)\phi(x - \cdot) \in L_1$, for every $x \in \mathbb{R}^d$.

The typical example of $T$ is a homogeneous elliptic differential operator of order $\kappa$
with constant coefficients. In this case $\phi$ is its fundamental solution on $\mathbb{R}^d$. Note that
$\phi$ is, in this case, a smooth function on $\mathbb{R}^d \setminus 0$. It will be sometimes convenient
to add this assumption to A2:

$$ \phi \text{ is smooth on } \mathbb{R}^d \setminus 0. $$

On the other hand, assumption A2 may appear too strong for certain applications, since
it excludes some interesting examples corresponding to fractional differentiation. Those
who may wish to extend our theory in such directions may allow $T(f)$ to have global
support; it will still need to decay suitably at infinity. All in all, there is some flexibility
in the formulation of A2.

That said, A2, either in its current variant or in some related one, is fundamental: it
determines the maximal decay rate of the error that our approximation scheme can yield
(this is the parameter $\kappa$), and determines the space $W(L_p(\mathbb{R}^d), \phi)$ of smooth functions
that can be approximated at this rate. Let us discuss now this latter issue.

The function space $W(L_p(\mathbb{R}^d), \phi)$ corresponds to the Sobolev space
$W^\kappa(L_p(\mathbb{R}^d))$ in the case $T$ is an elliptic differential operator of order $\kappa$ (with
constant coefficients). For more general $T$, the definition is more abstract: Fixing
$1 \le p \le \infty$, we define the semi-norm

$$ |f|_{W(L_p(\mathbb{R}^d), \phi)} := \|Tf\|_{L_p(\mathbb{R}^d)}, \qquad f \in C_c^\infty, \tag{2.2} $$

and the norm:

$$ \|f\|_{W(L_p(\mathbb{R}^d), \phi)} := \|f\|_{L_p(\mathbb{R}^d)} + |f|_{W(L_p(\mathbb{R}^d), \phi)}. \tag{2.3} $$

Using the above, we define $W(L_p(\mathbb{R}^d), \phi)$ to be the completion of $C_c^\infty$ in this
topology. Since the norm in (2.3) is stronger than the $L_p$-norm, $W(L_p, \phi)$ is precisely the
space of all functions $f \in L_p(\mathbb{R}^d)$ for which there is a sequence
$(f_n)_{n \ge 1} \subset C_c^\infty$ such that $f_n$ converges to $f$ and $(T(f_n))$ converges to a
function $g \in L_p(\mathbb{R}^d)$, both in the sense of $L_p(\mathbb{R}^d)$. By making suitable assumptions
on $\phi$ we can conclude that $g$ depends on $f$, but not on the specific sequence $(f_n)$. The
operator $T$, which was initially defined only for test functions, now extends naturally to a linear
operator on $W(L_p, \phi)$ by defining $Tf := g$.

We are guided in the above setup by the following example, [13, 14]: if $T$ is the $m$-fold
Laplacian, then $\kappa = 2m$ and the function $\phi$ is then the fundamental solution of $T$:

$$ \phi = c_m \begin{cases} |\cdot|^{2m-d}, & d \text{ odd}, \\ |\cdot|^{2m-d} \log|\cdot|, & d \text{ even}. \end{cases} \tag{2.4} $$

(Here, $|\cdot|$ stands for the Euclidean norm in $\mathbb{R}^d$.) The function $\phi$ is also called a
surface spline. In this case, $W(L_p(\mathbb{R}^d), \phi)$ is simply the Sobolev space
$W^{2m}(L_p(\mathbb{R}^d))$ equipped with its usual semi-norm and norm.

Our remaining assumption about $\phi$ concerns how well its translates can be approximated
from $S_\Xi(\phi)$. Consider the translate $\phi(\cdot - t)$ where $t \in \mathbb{R}^d$ is fixed for the
moment. We look for a local approximation of the form

$$ K(\cdot, t) = \sum_{\xi \in \Xi(t)} A(t, \xi)\, \phi(\cdot - \xi), \tag{2.5} $$

for suitable $t$-dependent coefficients $A(t, \xi)$ and a finite set $\Xi(t) \subset \Xi$. A key
component in the success of our approach is the availability of kernels $K(\cdot, \cdot)$ that are
local and bounded on the one hand, and approximate well the convolution kernel
$(x, t) \mapsto \phi(x - t)$ on the other. We break this assumption into two: A3 deals with basic
qualitative properties of the scheme that is used to define $K$, viz., the coefficient functions
$A(\cdot, \xi)$. The companion property, A4, deals with the way $K$ approximates the convolution
kernel.

A3: There is an integer $n' > 0$ and a real number $M_0$ such that for any $t$ the set
$\Xi(t)$ consists of at most $n'$ points all lying in the ball of radius $M_0$ centered at $t$ and the
coefficients of the approximation kernel (2.5) satisfy $A(\cdot, \xi) \in L_\infty(\mathbb{R}^d)$ for all
$\xi \in \Xi$.

Similar to condition A1, condition A3 is secondary, and is formulated and adopted in
order to exclude pathological kernels $K$. This brings us to our last assumption. As said,
the last assumption is concerned with the way $K(\cdot, t)$ approximates the translate
$\phi(\cdot - t)$. This assumption must be handled with care: the error
$E(\cdot, t) := \phi(\cdot - t) - K(\cdot, t)$ should reflect not only the basic properties of $\phi$,
but also the local distribution of the center set $\Xi$ around $t$. To this end, we define for each
$t \in \mathbb{R}^d$

$$ h(t) := \inf\{\rho : A(t, \xi) = 0 \text{ for all } |\xi - t| > \rho\}. \tag{2.6} $$

In other words, for each $t \in \mathbb{R}^d$, the only centers from $\Xi$ used in the approximation
kernel $K(\cdot, t)$ lie in a ball $B_t(h(t))$ of radius $h(t)$ centered at $t$. In our approach, $h(t)$
measures the "effective" local density of the set $\Xi$ around the point $t$. We will discuss this
issue in the sequel. Right now, let us complete our basic assumptions. We define the error kernel

$$ E(x, t) := \phi(x - t) - K(x, t), \qquad x, t \in \mathbb{R}^d. \tag{2.7} $$

Notice that for each $t \in \mathbb{R}^d$, $E$ is a finite linear combination of translates of $\phi$ using
centers from $t \cup \Xi$. We shall assume that

A4: There is a positive real number $\nu > d$ and a constant $C > 0$ depending only on $\phi$
such that

$$ |E(x, t)| \le C\, h(t)^{\kappa - d} \left(1 + \frac{|x - t|}{h(t)}\right)^{-\nu}, \qquad x, t \in \mathbb{R}^d, \tag{2.8} $$

where $\kappa$ is the integer in A2.

As we will see in the examples that follow, the local density $h(t)$ must be chosen to
satisfy two properties: first, the ball $B_t(h(t)) := \{x \in \mathbb{R}^d : |x - t| \le h(t)\}$ must
contain a minimal number of centers from $\Xi$. This number is determined by the parameter $\nu$.
However, choosing $h(t)$ at this minimal value leads to error kernels $E(\cdot, t)$ that are too
large, forcing us to select a large constant $C$. In such a case, it is usually preferable to
increase $h(t)$, so that more centers are captured in the enlarged ball $B_t(h(t))$. By playing
this game correctly at all points $t$, we can control the global constant $C$. Needless to say,
this comes at a price, since the density function $h$ will enter our error bounds as well.

The remainder of this section will discuss two examples where the assumptions A1-4
are satisfied. These examples will provide a better understanding of the assumptions as
well as of the nature of the smoothness spaces $W$ and the density function $h$.

Example 1: Univariate splines.

We consider the truncated power $\phi(t) := t_+^{\kappa - 1}$ defined on $\mathbb{R}$. We have the
elementary and well-known representation

$$ f(x) = \frac{1}{(\kappa - 1)!} \int_{-\infty}^{\infty} f^{(\kappa)}(t)\, \phi(x - t)\, dt, \tag{2.9} $$

which holds for all functions in $C_c^\kappa(\mathbb{R})$. This means that A2 holds for
$T := D^\kappa/(\kappa - 1)!$. Now given $t \in \mathbb{R}$, let
$\Xi(t) = \{\xi_1(t), \ldots, \xi_\kappa(t)\} \subset \Xi$ be the set of the $\kappa$ points in $\Xi$ that
are closest to $t$. The divided difference

$$ [t, \xi_1(t), \ldots, \xi_\kappa(t)]\, \phi(x - \cdot) =: a(t)\phi(x - t) - \sum_{\xi \in \Xi(t)} A_0(t, \xi)\, \phi(x - \xi) \tag{2.10} $$

is the B-spline $x \mapsto M(x)$ associated with the knots $\{t, \xi_1(t), \ldots, \xi_\kappa(t)\}$. Thus,
we can take $A(t, \xi) = A_0(t, \xi)/a(t)$ for $\xi \in \Xi(t)$ and $A(t, \xi) = 0$ otherwise. It follows
that $h(t) = \max_{j = 1, \ldots, \kappa} |t - \xi_j(t)|$. A well-known property of divided differences
(see [12], p. 121) gives

$$ |a(t)| = \prod_{j=1}^{\kappa} |t - \xi_j(t)|^{-1} \ge h(t)^{-\kappa}. \tag{2.11} $$

Since $|E(x, t)| = M(x)/|a(t)| \le h(t)^\kappa M(x)$, assumption A4 follows from the facts that
$M(x) \le 1/h(t)$ and that $M(x)$ vanishes for $x \notin (t - h(t), t + h(t))$. As to the constant
$C$, it can be chosen as $C := 4$ (for $\nu := 2$). The verification of A3 is straightforward (via
A1).
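The construction in Example 1 can be sketched numerically. The following is a minimal illustration, not the authors' code: the function names `truncated_power`, `local_coefficients` and `error_kernel` are hypothetical, the point $t$ is assumed not to coincide with a center, and $\kappa \ge 2$ is assumed. It computes the coefficients $A(t, \xi)$ from the standard divided-difference weights $1/\prod_{j \ne i}(x_i - x_j)$ and evaluates the error kernel $E(x, t)$.

```python
import numpy as np

def truncated_power(u, kappa):
    """phi(u) = u_+^(kappa-1); assumes kappa >= 2."""
    u = np.asarray(u, dtype=float)
    return np.where(u > 0, np.abs(u) ** (kappa - 1), 0.0)

def local_coefficients(t, centers, kappa):
    """Return (xi, A, h): the kappa centers closest to t, the coefficients
    A(t, xi) of Example 1, and the local density h(t).
    Assumes t is not itself a center."""
    centers = np.asarray(centers, dtype=float)
    idx = np.argsort(np.abs(centers - t))[:kappa]
    xi = centers[idx]
    nodes = np.concatenate(([t], xi))
    # Divided-difference weights: w_i = 1 / prod_{j != i} (nodes_i - nodes_j).
    diffs = nodes[:, None] - nodes[None, :]
    np.fill_diagonal(diffs, 1.0)
    weights = 1.0 / diffs.prod(axis=1)
    a = weights[0]              # coefficient a(t) of phi(x - t)
    A = -weights[1:] / a        # A(t, xi) = A_0(t, xi) / a(t)
    h = np.max(np.abs(xi - t))
    return xi, A, h

def error_kernel(x, t, centers, kappa):
    """E(x, t) = phi(x - t) - sum_xi A(t, xi) phi(x - xi)."""
    xi, A, _ = local_coefficients(t, centers, kappa)
    K = sum(c * truncated_power(x - z, kappa) for c, z in zip(A, xi))
    return truncated_power(x - t, kappa) - K
```

In this sketch the coefficients sum to one (divided differences annihilate constants), and $E(\cdot, t)$ vanishes outside the convex hull of $\{t\} \cup \Xi(t)$, which is the locality used in the verification of A4.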

It is worth stressing the fact that our ability to provide tight error estimates for
univariate splines is not only due to the banded structure of the error kernel $E$. It is also
due to the fact that univariate spline theory tells us that $M(x) \le 1/h(t)$: thus, while the
actual coefficient $A(\cdot, \xi_1)$ of the truncated power $(\cdot - \xi_1)_+^{\kappa - 1}$ in the
representation of $M(x)$ can be arbitrarily large, the size of this coefficient does not affect
$\|M\|_\infty$. Unfortunately, we are not aware of a multivariate counterpart of this result. ■

Example 2: Surface splines.

The multivariate analog of the truncated power is the surface spline (see (2.4); it is also
known as the polyharmonic spline). The best known surface spline is the bivariate thin-plate
spline

$$ \phi = |\cdot|^2 \log|\cdot|, $$

which is, up to a constant, the fundamental solution for $\Delta^2$ when $d = 2$.

We have noted earlier that property A2 holds for surface splines of any dimension.
We will analyse in detail A4, and will briefly discuss A3.

Let $\nu > d$ be the number that appears in A4, and define

$$ n := \kappa - d + \nu. $$

Let $P$ be the space of all polynomials of degree $< n$ in $d$ variables. Given a finite set
$Z \subset \mathbb{R}^d$, we denote by $\Lambda_Z$ the span of the functionals

$$ \delta_z \in P', \quad z \in Z, \qquad \delta_z(p) := p(z), \quad p \in P. $$

Every $\lambda := \sum_{z \in Z} a(z)\delta_z \in \Lambda_Z$ extends to $C(\mathbb{R}^d)'$, with the norm of
the extension being

$$ \|\lambda\| = \sum_{z \in Z} |a(z)|. $$

Now, for every $t \in \mathbb{R}^d$, we select a finite subset $\Xi(t) \subset \Xi$ that satisfies the
following properties:

(i) $h(t) := \max\{|t - \xi| : \xi \in \Xi(t)\}$ is "as small as possible".

(ii) There exists $\lambda_t \in \Lambda_{\Xi(t)}$ such that $\lambda_t$ agrees with $\delta_t$ on $P$.

(iii) $\|\lambda_t\| \le C$, for some $t$-independent constant $C$ that we choose in advance.

Let us first remark that there are always sets $\Xi(t)$ satisfying (ii) and (iii).^2 We have
said nothing about the size of the constant $C$ in (iii). We do not provide specific algorithms
for choosing $C$ optimally. The general rule of thumb is that by choosing $\Xi(t)$ to contain
the $r + \dim P$ points in $\Xi$ that are closest to $t$, with $r$ a small positive integer, we should
be able to find (with high probability, for a generic distribution of centers) $\lambda_t$ that satisfies
the above.
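The rule of thumb above can be illustrated with a short numerical sketch. The text does not prescribe an algorithm for finding $\lambda_t$, so the choice made here is an assumption: among all coefficient vectors that reproduce $\delta_t$ on $P$, take the one of minimal Euclidean norm, as returned by `numpy.linalg.lstsq` for an underdetermined system. The names `monomial_exponents` and `reproduction_weights`, and the integer degree bound `n`, are hypothetical. The quantity that enters (iii) is the $\ell_1$ norm of the returned coefficients, which can be inspected afterwards.

```python
import numpy as np
from itertools import product

def monomial_exponents(d, n):
    """All multi-indices alpha in d variables with |alpha| < n."""
    return [a for a in product(range(n), repeat=d) if sum(a) < n]

def reproduction_weights(t, centers, n, r=3):
    """Pick the r + dim(P) centers closest to t and solve for coefficients
    a(xi) with sum_xi a(xi) p(xi) = p(t) for all polynomials p of degree < n.
    Minimal 2-norm solution (an illustrative choice, not the paper's)."""
    t = np.asarray(t, dtype=float)
    centers = np.asarray(centers, dtype=float)
    exps = monomial_exponents(centers.shape[1], n)
    m = r + len(exps)
    idx = np.argsort(np.linalg.norm(centers - t, axis=1))[:m]
    local = centers[idx]
    h = np.linalg.norm(local - t, axis=1).max()
    # Shift and scale so the monomial system is better conditioned; P is
    # invariant under shifts and dilations, so reproduction is unaffected.
    Y = (local - t) / h
    V = np.stack([np.prod(Y ** np.array(a), axis=1) for a in exps])
    rhs = np.array([1.0 if sum(a) == 0 else 0.0 for a in exps])
    coef, *_ = np.linalg.lstsq(V, rhs, rcond=None)
    return local, coef, h
```

For a generic set of centers the system has full row rank and the reproduction is exact up to rounding; `np.abs(coef).sum()` then gives the value of $\|\lambda_t\|$ for this particular choice.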

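Once we have chosen the functional $\lambda_t =: \sum_{\xi \in \Xi(t)} A(t, \xi)\delta_\xi$, we define

$$ A(t, \xi) := 0, \quad \text{on } \Xi \setminus \Xi(t), $$

and define the kernels

$$ K(\cdot, t) := \sum_{\xi \in \Xi(t)} A(t, \xi)\, \phi(\cdot - \xi), \qquad E(\cdot, t) := \phi(\cdot - t) - K(\cdot, t). $$

Next, recall that (up to a constant that depends only on $d$ and $\kappa$) $\phi = |\cdot|^{\kappa - d} L$,
with $L = \log|\cdot|$ whenever $\kappa - d$ is an even integer, and $L = 1$ otherwise.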

We now complete the proof of property A4. Let $\Xi$ and $t$ be given. While we must
verify A4 for every $t$ and every set $\Xi(t)$ satisfying (i-iii), we can (by translating both $t$ and
$\Xi$) assume without loss of generality that $t = 0$. Let $h := h(t)$ be as in (i). Suppose first
that $|x| > 2h$. If $R$ is any polynomial of degree $< n$, then

$$ |\phi(x) - K(x, 0)| = |\phi(x) - \lambda_0(\phi(x - \cdot))| \le |\phi(x) - R(x)| + |\lambda_0(\phi(x - \cdot) - R(x - \cdot))|. \tag{2.12} $$

^2 A sketch of the argument is as follows. First, the claim is definitely correct when
$\Xi = \mathbb{Z}^d$. Therefore, there exists $\delta > 0$, such that the claim is correct, provided that
$\Xi$ has non-empty intersection with any ball $B_a(\delta)$, $a \in \mathbb{Z}^d$. Since $\|\lambda\|$ is
invariant under dilation, the claim is thus correct provided that $\Xi$ has non-empty intersection with
any ball $B_a(R(1))$, $a \in \frac{R(1)}{\delta}\mathbb{Z}^d$, and with $R(1)$ as in A1. Thanks to
assumption A1, our $\Xi$ satisfies, indeed, this last property. The argument given here is of mostly
theoretical value, since it employs a localization process that involves only a small subset of
$\Xi$, and results therefore in a density function that is prohibitively large.

In particular, choosing $R$ as the Taylor polynomial of degree $n - 1$ at $x$ of $\phi$, the first term
on the right side of (2.12) is zero and we obtain

$$ |\phi(x) - K(x, 0)| \le C \|\phi - R\|_{L_\infty(B)}, $$

where $C$ is the constant in (iii) and $B$ is the ball of radius $h$ about $x$. From the Taylor
remainder formula we obtain

$$ |\phi(x) - K(x, 0)| \le C \|\phi - R\|_{L_\infty(B)} \le C C' |x|^{\kappa - d - n} h^n \le C C'' h^{\kappa - d} \left(1 + \frac{|x|}{h}\right)^{-\nu}, \tag{2.13} $$

where the constants $C'$, $C''$ depend on $\phi$ and $\nu$ but are independent of $\Xi$, $t$, $x$. Here, we
used the fact that $|x'|$ and $|x|$ are comparable for $x' \in B$ because $|x| > 2h$.
If $|x| \le 2h$, then $|x - \xi| \le 3h$ for every $\xi \in \operatorname{supp} \lambda_0$. Assume
momentarily that $h = 1$. Then we simply estimate

$$ |\phi(x) - K(x, 0)| = |\phi(x) - \lambda_0(\phi(x - \cdot))| \le (1 + \|\lambda_0\|)\, \|\phi\|_{L_\infty(B)} \le C C''', \tag{2.14} $$

where now $B$ is the ball of radius 3 about the origin, $C'''$ depends only on $\phi$ and $n$, and
$C$ is the constant in (iii).

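Now, suppose that $h := h(0) \ne 1$. Then, dilating $\phi$, $\Xi$ and $\lambda_0$ by $h$, we note that

$$ \phi(x/h) - \lambda_0(\phi((x - \cdot)/h)) = h^{d - \kappa}\big(\phi(x) - \lambda_0(\phi(x - \cdot))\big) + \big(q(x) - \lambda_0(q(x - \cdot))\big), $$

with $q$ a polynomial of degree $\le \kappa - d$ (viz., $q = 0$ for odd $\kappa - d$, and
$q = -|\cdot|^{\kappa - d} h^{d - \kappa} \log h$ otherwise). Since $\delta_0 - \lambda_0$ annihilates all
such polynomials, we conclude that

$$ \phi(x/h) - \lambda_0(\phi((x - \cdot)/h)) = h^{d - \kappa}\big(\phi(x) - \lambda_0(\phi(x - \cdot))\big). \tag{2.15} $$

Invoking now the analysis of the $(h = 1)$-case, we have that

$$ |\phi(x/h) - \lambda_0(\phi((x - \cdot)/h))| \le C C''', $$

and hence, by (2.15),

$$ |\phi(x) - \lambda_0(\phi(x - \cdot))| \le C C''' h^{\kappa - d}, $$

provided that $|x| \le 2h$. Altogether, for $x \in \mathbb{R}^d$,

$$ |\phi(x) - K(x, 0)| \le C C_1 h^{\kappa - d} \left(1 + \frac{|x|}{h}\right)^{-\nu}, \tag{2.16} $$

with $C$ as in (iii), and $C_1$ a universal constant.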
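The dilation step can be checked numerically in the simplest logarithmic case. The sketch below is only an illustration under stated assumptions: it takes the thin-plate spline $\phi = |\cdot|^2 \log|\cdot|$ in $\mathbb{R}^2$ (so $\kappa = 4$, $d = 2$), and the function names `thin_plate` and `dilation_defect` are hypothetical. It evaluates $\phi(x/h) - h^{d-\kappa}\phi(x) - q(x)$ with $q = -|\cdot|^{\kappa-d} h^{d-\kappa} \log h$, which should vanish identically.

```python
import numpy as np

def thin_plate(x):
    """phi(x) = |x|^2 log|x| for x in R^2, with phi(0) = 0."""
    r = np.linalg.norm(np.asarray(x, dtype=float), axis=-1)
    safe = np.where(r > 0, r, 1.0)  # avoid log(0); the factor r^2 gives 0 there
    return r ** 2 * np.log(safe)

def dilation_defect(x, h):
    """phi(x/h) - h^(d-kappa) phi(x) - q(x) for kappa = 4, d = 2,
    where q(x) = -|x|^2 h^(-2) log h is the polynomial correction."""
    x = np.asarray(x, dtype=float)
    q = -np.sum(x ** 2, axis=-1) * h ** (-2.0) * np.log(h)
    return thin_plate(x / h) - h ** (-2.0) * thin_plate(x) - q
```

Because $q$ is a polynomial of degree $\kappa - d = 2 < n$, it is annihilated by $\delta_0 - \lambda_0$, which is why only the factor $h^{d-\kappa}$ survives in (2.15).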

For general $t \in \mathbb{R}^d$, an argument identical to the above leads to

$$ |\phi(x - t) - K(x, t)| \le C C_1 h(t)^{\kappa - d} \left(1 + \frac{|x - t|}{h(t)}\right)^{-\nu}. $$

This validates A4. It also shows that the constant that appears in A4 is the product of a
constant that is independent of $\Xi$, $x$, $t$ by the uniform bound for the norms $\|\lambda_t\|$,
$t \in \mathbb{R}^d$.

Concerning property A3, for each fixed $\xi$, the function $A(\cdot, \xi)$ has compact support
because of assumption A1 and our remarks above about the choice of $\Xi(t)$. Since this
function is also uniformly bounded as we have shown in the discussion of (iii), we see that
A3 is also satisfied.



3 Approximation with a prescribed set $\Xi$ of centers

In this section, we assume that the set $\Xi$ of centers is fixed in advance. We work under
the assumption that $\Xi$, $\phi$ satisfy A1-A4. We shall prove a theorem for the approximation
of a given function $f \in W(L_p(\mathbb{R}^d), \phi)$ by the elements of $S_\Xi(\phi)$. In §5, we will
extend the results of this section to more general functions in $L_p(\mathbb{R}^d)$.

Since our goal is to derive error estimates that reflect the local density of $\Xi$, it may
seem that we can employ our measure of density $t \mapsto h(t)$ in such estimates. However,
it turns out that $h$ may change too rapidly to allow effective error analysis (unless one
replaces A4 by the stronger assumption that $E(x, t)$ is supported in the domain
$\{(x, t) : |x - t| \le C h(t)\}$. However, the only interesting example that satisfies this stronger
condition is univariate splines). To circumvent this difficulty, we introduce a companion
density function $H$ that varies more slowly than the original $h$.

Given $\Xi$, $\phi$ and a local density $h$ that satisfy A1-A4, we define

$$ H(x) := \sup_{t \in \mathbb{R}^d} h(t) \left(1 + \frac{|x - t|}{h(t)}\right)^{-\tau}, \tag{3.1} $$

where $\tau$ is any fixed number satisfying

$$ 0 < \tau < \frac{\nu - d}{\kappa}, \tag{3.2} $$

and with $\nu$ as in assumption (2.8). The larger the value of $\tau$, the smaller the density
function $H$ and the better the estimates that we shall obtain.^3 Notice also that in the
examples in the previous section, the number $\nu$ appearing in A4 can be chosen arbitrarily
large; however, the constant that appears in A4 and the density function $h$ depend both
on the selection of this $\nu$.

^3 It is therefore natural to try to take $\tau = (\nu - d)/\kappa$ in the analysis we give below; but
this fails to work (barely). One could introduce logarithmic factors in the definition of $H$ and get
slightly improved results but at the expense of notational complications that we want to avoid.
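To get a feeling for how the companion density smooths out rapid changes in $h$, here is a minimal one-dimensional sketch. It assumes the form $H(x) = \sup_t h(t)\,(1 + |x - t|/h(t))^{-\tau}$, which is the form used in the estimates of this section, and it replaces the supremum over all $t$ by a maximum over a finite grid, so the result is only a lower approximation of $H$. The function name `smoothed_density` and its arguments are hypothetical.

```python
import numpy as np

def smoothed_density(x, t_grid, h_vals, tau):
    """Approximate H(x) = sup_t h(t) * (1 + |x - t| / h(t))**(-tau),
    with the supremum taken over the sample points t_grid only."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    t_grid = np.asarray(t_grid, dtype=float)
    h_vals = np.asarray(h_vals, dtype=float)
    dist = np.abs(x[:, None] - t_grid[None, :])
    candidates = h_vals[None, :] * (1.0 + dist / h_vals[None, :]) ** (-tau)
    return candidates.max(axis=1)
```

Taking $t = x$ in the supremum shows $H \ge h$, and since each candidate is at most $h(t)$ we also have $H \le \sup h$; with a jump in $h$ (dense centers on one side, sparse on the other), $H$ decays gradually from the sparse region into the dense one at a rate governed by $\tau$.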
We assume that $f \in W(L_p(\mathbb{R}^d), \phi)$ and then derive an error estimate for approximating
$f$ by the elements of $S_\Xi(\phi)$. We first want to enlarge the space $S_\Xi(\phi)$ to include certain
infinite sums. Given $1 \le p \le \infty$, we define

$$ S_\Xi(\phi)_p $$

to be the closure of $S_\Xi(\phi)$ in the topology of convergence in $L_p(\Omega)$ for each compact
$\Omega \subset \mathbb{R}^d$.

To describe the approximation procedure we are going to use, we recall the kernel $K$
(given by (2.5)) which describes how $\phi(\cdot - t)$ is approximated. Also recall the error bound
(2.8) for $E = \phi(\cdot - t) - K(\cdot, t)$ which we assume to hold for this approximation.

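Given any positive weight function $w$ defined on $\mathbb{R}^d$, we define the norm

$$ \|g\|_{L_p(w)} := \|w g\|_{L_p(\mathbb{R}^d)}, \qquad 1 \le p \le \infty, \tag{3.3} $$

and the approximation error

$$ \mathcal{E}(f, S_\Xi(\phi)_p)_{L_p(w)} := \inf_{g \in S_\Xi(\phi)_p} \|f - g\|_{L_p(w)}, \qquad 1 \le p \le \infty. \tag{3.4} $$

Notice that $\|\cdot\|_{L_p(w)}$ differs from the more usual definition of weighted $L_p$-norms.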

Theorem 3.1 Suppose that $\Xi$, $\phi$ satisfy A1-4, and let $1 \le p \le \infty$. If
$f \in W(L_p(\mathbb{R}^d), \phi)$ and $w := H^{-\kappa}$, then we have

$$ \mathcal{E}(f, S_\Xi(\phi)_p)_{L_p(w)} \le C_0 |f|_{W(L_p(\mathbb{R}^d), \phi)} \tag{3.5} $$

with $C_0 = C C'$, $C'$ being dependent only on $\phi$, $\nu$ and $\tau$, and $C$ the constant that appears
in A4.

The parameter $\kappa$ (which was introduced in Assumptions A2, A4) determines the rate of
decay of the error. Note that $\kappa$ appears on both sides of (3.5) (in the definition of $L_p(w)$,
as well as in the definition of $W(L_p(\mathbb{R}^d), \phi)$).

The theorem provides local error estimates in terms of the density $H$. Where $H$ is
small, i.e. $\Xi$ is dense, the approximation bound is better. The local nature of the error
estimates is best captured in the case $p = \infty$:

Corollary 3.2 In the notations and assumptions of Theorem 3.1, the error bound in the
case $p = \infty$ can be restated as follows: for every compact $\Omega \subset \mathbb{R}^d$ and every
$\varepsilon > 0$, there exists $g \in S_\Xi(\phi)$, such that, for every $x \in \Omega$,

$$ |f(x) - g(x)| \le \varepsilon + C_0 H(x)^\kappa |f|_{W(L_\infty(\mathbb{R}^d), \phi)}. \tag{3.6} $$

The constant $C_0$ is independent of $f$, $g$, $\varepsilon$, $\Omega$ and $x$.
Proof of Theorem 3.1. We begin by assuming that $f \in C_c^\infty(\mathbb{R}^d)$ and later use a
completion argument to derive the general case. We shall establish the estimate (3.5)
for $p = 1, \infty$ and then derive the general case by interpolation. For any function
$g \in L_1(\mathbb{R}^d) + L_\infty(\mathbb{R}^d)$, we define

$$ L(g, x) := \int_{\mathbb{R}^d} H(x)^{-\kappa} g(t)\, E(x, t)\, dt. \tag{3.7} $$

Then $L$ is a linear operator and we shall show that it maps $L_p$ boundedly into itself for
$p = 1, \infty$. Once this is established, the Marcinkiewicz interpolation theorem implies that
$L$ maps $L_p(\mathbb{R}^d)$ boundedly into itself for all $1 \le p \le \infty$.

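First consider the case $p = 1$. We invoke the estimate (2.8) and the definition of $H$
given in (3.1) to find

$$ H(x)^{-\kappa} |E(x, t)| \le C h(t)^{-d} \left(1 + \frac{|x - t|}{h(t)}\right)^{-\nu + \tau\kappa}. \tag{3.8} $$

Thus

$$ \begin{aligned} \|L\|_{L_1(\mathbb{R}^d)} &= \sup_{t \in \mathbb{R}^d} \int_{\mathbb{R}^d} H(x)^{-\kappa} |E(x, t)|\, dx \\ &\le \sup_{t \in \mathbb{R}^d} C \int_{\mathbb{R}^d} h(t)^{-d} \left(1 + \frac{|x - t|}{h(t)}\right)^{-\nu + \tau\kappa} dx \\ &= C \int_{\mathbb{R}^d} (1 + |y|)^{-\nu + \tau\kappa}\, dy \le C', \end{aligned} \tag{3.9} $$

where we used the fact that $-\nu + \tau\kappa < -d$.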

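For the case $p = \infty$, we fix $x \in \mathbb{R}^d$ and define for each $j \in \mathbb{Z}$ the set

$$ \Omega_j := \left\{ t \in \mathbb{R}^d : 2^{j-1} \le \frac{h(t)}{H(x)} < 2^j \right\}. \tag{3.10} $$

Then,

$$ \int_{\mathbb{R}^d} H(x)^{-\kappa} |E(x, t)|\, dt = \sum_{j \in \mathbb{Z}} \int_{\Omega_j} H(x)^{-\kappa} |E(x, t)|\, dt =: \sum_{j \in \mathbb{Z}} I_j. \tag{3.11} $$

We can estimate each of the integrals $I_j$ appearing in the sum by using (2.8) to obtain

$$ I_j \le C 2^{\kappa j} \int_{\Omega_j} [2^j H(x)]^{-d} \left(1 + \frac{|x - t|}{2^j H(x)}\right)^{-\nu} dt = C 2^{\kappa j} \int_{\Omega'_j} (1 + |y|)^{-\nu}\, dy, \tag{3.12} $$

where $\Omega'_j := [2^j H(x)]^{-1}(x - \Omega_j)$. Since $\nu > \tau\kappa + d > d$, it is clear that
$\sum_{j \le 1} I_j \le C$. For $j > 1$, we use the definition of $H$ to find

$$ \left(1 + \frac{2|x - t|}{2^j H(x)}\right)^{\tau} \ge \left(1 + \frac{|x - t|}{h(t)}\right)^{\tau} \ge \frac{h(t)}{H(x)} \ge 2^{j-1}, \qquad t \in \Omega_j. \tag{3.13} $$

In other words,

$$ \frac{|x - t|}{2^j H(x)} \ge \frac{2^{(j-1)/\tau} - 1}{2} =: a_j, \qquad t \in \Omega_j. \tag{3.14} $$

This means that

$$ I_j \le C 2^{\kappa j} \int_{|y| \ge a_j} (1 + |y|)^{-\nu}\, dy \le C 2^{\kappa j} a_j^{-\nu + d}. \tag{3.15} $$

Since $\kappa < (\nu - d)/\tau$, we have that $\sum_{j > 1} 2^{\kappa j} a_j^{-\nu + d}$ is finite. This, together
with the bound on $\sum_{j \le 1} I_j$, yields

$$ \|L\|_{L_\infty(\mathbb{R}^d)} = \sup_{x \in \mathbb{R}^d} \int_{\mathbb{R}^d} H(x)^{-\kappa} |E(x, t)|\, dt \le C. \tag{3.16} $$

Consequently, we have proved that $L$ boundedly maps $L_p$ into itself for every $1 \le p \le \infty$.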

Now, if $f \in W(L_p(\mathbb{R}^d), \phi)$, then by the definition of this space,
$Tf \in L_p(\mathbb{R}^d)$. Hence, from what we have already proved,

$$ \|L(Tf)\|_{L_p(\mathbb{R}^d)} \le C_0 \|Tf\|_{L_p(\mathbb{R}^d)} = C_0 |f|_{W(L_p(\mathbb{R}^d), \phi)}, \tag{3.17} $$

with $C_0 = C C'$, with $C'$ an absolute constant, and $C$ the constant that appears in A4.
Assume next that $f \in C_c^\infty(\mathbb{R}^d)$ and define

$$ F := \int_{\mathbb{R}^d} Tf(t)\, K(\cdot, t)\, dt. \tag{3.18} $$

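From A1-2 and the fact that $\operatorname{supp} Tf$ is compact, we deduce that the sum that defines
$K(\cdot, \cdot)$ is finite on $\mathbb{R}^d \times \operatorname{supp} Tf$. It follows then that

$$ F = \sum_{\xi \in \Xi} a(\xi)\, \phi(\cdot - \xi), \qquad a(\xi) := \int_{\mathbb{R}^d} A(t, \xi)\, Tf(t)\, dt. \tag{3.19} $$

As said, the sum that defines $F$ is actually finite. Thus, $F \in S_\Xi(\phi)$.
Continuing under the assumption that $f \in C_c^\infty(\mathbb{R}^d)$, we have, by (2.1),

$$ H(x)^{-\kappa} [f(x) - F(x)] = H(x)^{-\kappa} \int_{\mathbb{R}^d} Tf(t)\, E(x, t)\, dt = L(Tf)(x). \tag{3.20} $$

It follows from what we have already proved that

$$ \mathcal{E}(f, S_\Xi(\phi)_p)_{L_p(w)} \le \|f - F\|_{L_p(w)} \le C_0 \|Tf\|_{L_p(\mathbb{R}^d)} = C_0 |f|_{W(L_p(\mathbb{R}^d), \phi)}, \qquad 1 \le p \le \infty. \tag{3.21} $$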

We next want to extend (3.21) to all of $W(L_p(\mathbb{R}^d), \phi)$. Fix $p \in [1, \infty]$, and
let $f \in W(L_p(\mathbb{R}^d), \phi)$. By the definition of $W(L_p(\mathbb{R}^d), \phi)$ there is a sequence
of compactly supported functions $f_n$, $n = 1, 2, \ldots$, from $C_c^\infty$ such that $f_n \to f$ in the
norm of $W(L_p(\mathbb{R}^d), \phi)$. Let $F_n$ be defined by (3.18) for $f_n$. We know that each of these
$F_n$ is in $S_\Xi(\phi)$. For any compact set $\Omega$, we have $w(x) \ge c_\Omega > 0$, $x \in \Omega$, and
therefore by writing $F_m - F_n = F_m - f_m - (F_n - f_n) + (f_m - f_n)$ we find

$$ \begin{aligned} \|F_m - F_n\|_{L_p(\Omega)} &\le \|f_m - f_n\|_{L_p(\Omega)} + c_\Omega^{-1} \|f_m - f_n - (F_m - F_n)\|_{L_p(w)} \\ &\le \|f_m - f_n\|_{L_p(\mathbb{R}^d)} + C_\Omega \|T(f_m - f_n)\|_{L_p(\mathbb{R}^d)}, \end{aligned} \tag{3.22} $$

where the last inequality uses (3.21) for $f = f_m - f_n$. This shows that $(F_n)$ is a Cauchy
sequence in the topology of $L_p$-convergence on compact sets. By definition, its limit $G$ is
in $S_\Xi(\phi)_p$. Again, for any compact set $\Omega$ in $\mathbb{R}^d$, we have

$$ \|w(f - G)\|_{L_p(\Omega)} \le \lim_{n \to \infty} \|f_n - F_n\|_{L_p(w)} \le C_0 \lim_{n \to \infty} \|T(f_n)\|_{L_p(\mathbb{R}^d)} = C_0 \|T(f)\|_{L_p(\mathbb{R}^d)}, \tag{3.23} $$

with $C_0$ the constant of (3.21). Since $\Omega$ is arbitrary, we find

$$ \|w(f - G)\|_{L_p(\mathbb{R}^d)} \le C_0 \|T(f)\|_{L_p(\mathbb{R}^d)}. \tag{3.24} $$

Since $G \in S_\Xi(\phi)_p$, we can replace the left side of (3.24) by
$\mathcal{E}(f, S_\Xi(\phi)_p)_{L_p(w)}$. This completes the proof of the theorem. ■



Theorem 3.1 deals with the approximation of functions that are optimally smooth, i.e., in the space $W(L_p(\mathbb{R}^d),\phi)$. In §5, we establish results concerning the approximation of functions that are less smooth. For such functions, the weight $H^{-\kappa}$ is too strong. To this end, we state a counterpart of Theorem 3.1 for mollified versions of the original weight $w$. We still assume here that $f$ is optimally smooth. This assumption will be dropped in §5.

Suppose that $0 < s < \kappa$. We continue to work under the assumptions A1-4. If $f \in W(L_p(\mathbb{R}^d),\phi)$ then

$$\|h^{\kappa-s}Tf\|_{L_p(\mathbb{R}^d)} \tag{3.25}$$

is finite because $h$ is bounded (see A1 (ii)).

Theorem 3.3 If $0 < s < \kappa$ and $f \in W(L_p(\mathbb{R}^d),\phi)$ (for some $1 \le p \le \infty$), then for $w := H^{-s}$ we have

$$\mathcal{E}(f, S_\Xi(\phi))_{L_p(w)} \le C_0\|h^{\kappa-s}Tf\|_{L_p(\mathbb{R}^d)}, \tag{3.26}$$

with $C_0$ as in Theorem 3.1.

Proof: The proof is very similar to the proof of Theorem 3.1. We first remark that the two bounds

$$\sup_{t}\int H(x)^{-s}|E(x,t)|h(t)^{s-\kappa}\,dx \le C, \qquad \sup_{x}\int H(x)^{-s}|E(x,t)|h(t)^{s-\kappa}\,dt \le C, \tag{3.27}$$

hold with $C$ a constant depending only on $d$, $s$. Indeed, thanks to Assumption A4 it is sufficient to prove the above boundedness with the integrand replaced by

$$H(x)^{-s}h(t)^{s-\kappa}h(t)^{\kappa-d}\Big(1+\frac{|x-t|}{h(t)}\Big)^{-\nu} = H(x)^{-s}h(t)^{s-d}\Big(1+\frac{|x-t|}{h(t)}\Big)^{-\nu}.$$

Since we assume $\nu > \kappa\tau + d > s\tau + d$, the argument as given in the proof of Theorem 3.1 applies here verbatim to yield (3.27).

The bounds in (3.27) now imply that the linear operator

$$L(g) := H(x)^{-s}\int g(t)E(x,t)h^{s-\kappa}(t)\,dt$$

is bounded on $L_p(\mathbb{R}^d)$ for $p = 1, \infty$. By interpolation, we derive that this operator is bounded on $L_p(\mathbb{R}^d)$ for all $1 \le p \le \infty$. Using this for $g = h^{\kappa-s}T(f)$, we derive (3.26) for all $f \in W(L_p(\mathbb{R}^d),\phi)$ in the same way we have proven Theorem 3.1. ■
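The interpolation step used in this proof has a simple finite-dimensional analogue, which we sketch here for illustration only. For a matrix $A$, the $\ell_1 \to \ell_1$ norm is the largest absolute column sum and the $\ell_\infty \to \ell_\infty$ norm is the largest absolute row sum; Hölder's inequality then bounds the $\ell_p \to \ell_p$ norm by the product of these two quantities raised to the powers $1/p$ and $1-1/p$. The function names `schur_bound`, `lp_norm` and `matvec` are hypothetical helpers and not part of the paper.

```python
def schur_bound(A, p):
    """Upper bound for the l_p -> l_p norm of the matrix A, 1 <= p < infinity."""
    # l_infinity -> l_infinity norm: largest absolute row sum
    row = max(sum(abs(a) for a in r) for r in A)
    # l_1 -> l_1 norm: largest absolute column sum
    col = max(sum(abs(r[j]) for r in A) for j in range(len(A[0])))
    return col ** (1.0 / p) * row ** (1.0 - 1.0 / p)


def lp_norm(x, p):
    """The l_p norm of a finite sequence x."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def matvec(A, x):
    """Matrix-vector product A x for A given as a list of rows."""
    return [sum(a * t for a, t in zip(r, x)) for r in A]
```

In the proof above, the two bounds in (3.27) play the role of the column and row sums for the kernel $H(x)^{-s}E(x,t)h(t)^{s-\kappa}$.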



4 Wavelet decompositions 

In the remaining sections of this paper, we shall be in need of a local multiscale basis on
$\mathbb{R}^d$. We shall employ a standard multivariate wavelet basis for this purpose. This basis
will be used only as a tool for proving various results. In this section, we recall the form 
of such a basis and some of its properties which will be important to us. In particular, 
we shall need its characterization of Triebel-Lizorkin spaces. There are several books that 
discuss wavelet decompositions and their characterization of these spaces (see e.g. [22]). 
We also refer to the article of Daubechies [4] for the construction of wavelet bases of the 
type we want to use. 

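Let $\mathcal{D}$ denote the set of dyadic cubes in $\mathbb{R}^d$ and $\mathcal{D}_j$ the set of dyadic cubes of side length $2^{-j}$ (thus $\mathcal{D} = \bigcup_{j\in\mathbb{Z}}\mathcal{D}_j$). Each $I \in \mathcal{D}_j$ is of the form

$$I = 2^{-j}[k_1,k_1+1]\times\cdots\times 2^{-j}[k_d,k_d+1] = 2^{-j}(k + [0,1]^d), \qquad k = (k_1,\ldots,k_d)\in\mathbb{Z}^d. \tag{4.1}$$

For each such $I$, we denote its side length by $\ell(I)$:

$$\ell(I) := 2^{-j}, \qquad \forall I \in \mathcal{D}_j.$$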
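As a small illustration of the indexing in (4.1), the following sketch returns the integer vector $k$ and the side length of a dyadic cube of level $j$ that contains a given point. The function name `dyadic_cube` is a hypothetical helper, not notation from the paper. Since the cubes in (4.1) are closed, a point on a common face belongs to several cubes; the sketch simply picks the one given by rounding down.

```python
import math


def dyadic_cube(x, j):
    """Return (k, side) with x in 2^-j (k + [0,1]^d) and side = 2^-j."""
    side = 2.0 ** (-j)
    # Each coordinate of k is the integer part of the rescaled coordinate.
    k = tuple(math.floor(t / side) for t in x)
    return k, side
```

For example, `dyadic_cube((0.3, 0.8), 2)` identifies the cube $2^{-2}((1,3) + [0,1]^2)$.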

Finally, let

$$E := \{1,\ldots,2^d-1\}, \qquad V := \mathcal{D}\times E.$$

Given $v = (I_v, e_v)$ ($I_v\in\mathcal{D}$, $e_v \in E$), we denote

$$\ell(v) := \ell(I_v),$$

and by

$$|v| := \ell(v)^d$$

the volume of the cube $I_v$.

A wavelet basis is an orthonormal basis for $L_2(\mathbb{R}^d)$ with particular structure and properties. The wavelets are indexed by the set $V$:

$$w_v, \qquad v \in V.$$

Each wavelet $w_v$, with $I_v = 2^{-j}(k + [0,1]^d) \in \mathcal{D}$, is supported in a cube $\bar I_v$ with

$$\ell(\bar I_v) \le A_0\ell(I_v),$$

with $A_0$ some fixed constant that depends only on the specifics of the wavelet system we choose.

We normalized initially the wavelet system in $L_2(\mathbb{R}^d)$. The $L_p$-norm of the $p$-normalized wavelets, $1 \le p \le \infty$:

$$\psi_{v,p} := |v|^{1/2-1/p}w_v \tag{4.2}$$

depends only on their type, i.e., on the index $e_v \in E$. Each locally integrable function $f$ defined on $\mathbb{R}^d$ has a wavelet decomposition

$$f = \sum_{v\in V} f_{v,p}\psi_{v,p}. \tag{4.3}$$

Here $f_{v,p}$ depends on the $p$-normalization that has been chosen but the product $f_{v,p}\psi_{v,p}$ is independent of $p$. In this paper, it will be convenient to normalize the wavelets in $L_\infty$:

$$\psi_v := \psi_{v,\infty}, \qquad f_v := f_{v,\infty}.$$

The series (4.3) converges absolutely to $f$ in the $L_p$-norm in the case $f \in L_p(\mathbb{R}^d)$ and $1 \le p < \infty$, with $H_1(\mathbb{R}^d)$ replacing $L_1(\mathbb{R}^d)$, and conditionally in the case $p = \infty$ with $L_\infty(\mathbb{R}^d)$ replaced by $C(\mathbb{R}^d)$.

One of the most important properties of wavelet systems (and the one we need in this paper) is the characterization of smoothness spaces in terms of the wavelet decomposition. This means that we can use the wavelet decomposition in order to define those smoothness spaces. For the definition of Triebel-Lizorkin spaces $F^s_{p,q}$ in terms of wavelet coefficients, we fix $s > 0$. Then, for $0 < p, q < \infty$,

$$|f|_{F^s_{p,q}} := \|M_q(f)\|_{L_p(\mathbb{R}^d)}, \tag{4.4}$$

where

$$M_q(f)(x) := M_{s,q}(f)(x) := \Big(\sum_{v\in V}\ell(v)^{-sq}|f_{v,\infty}|^q\chi_{\bar I_v}(x)\Big)^{1/q}. \tag{4.5}$$
The definition does not depend on the wavelet system we choose, provided that the
wavelets are $m$-times differentiable, and have $m$ vanishing moments (viz., their Fourier
transform has an $m$-fold zero at the origin), for sufficiently large $m$.^1

We should also make some specific remarks about our definition. In our definition we have defined the maximal function with $\chi_{\bar I_v}$, where $\bar I_v$ is the support cube of $\psi_v$. The usual definition of $M_q$ uses $\chi_{I_v}$ instead of $\chi_{\bar I_v}$. It is easy to see that these two definitions give equivalent norms by using the Fefferman-Stein inequality mentioned in the proof of Lemma 6.1. We also remark that Triebel-Lizorkin spaces are usually defined using Littlewood-Paley decompositions. Our definition agrees with the classical definition if the wavelet system is chosen appropriately (cf. the above footnote) and the space $F^s_{p,q}$ continuously embeds into $L_1$.

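The definition extends naturally to the $q = \infty$ case, with $M_\infty(f)$ defined by

$$M_\infty(f)(x) := M_{s,\infty}(f)(x) := \sup_{v\in V}\ell(v)^{-s}|f_{v,\infty}|\chi_{\bar I_v}(x).$$

While we have so far followed the tradition of assuming $p < \infty$, we will use the space $F^s_{\infty,\infty}$ whose semi-norm is

$$|f|_{F^s_{\infty,\infty}} := \sup_{v\in V,\ x\in\mathbb{R}^d}\ell(v)^{-s}|f_{v,\infty}|\chi_{\bar I_v}(x),$$

and which usually appears in the literature as the Besov space $B^s_{\infty,\infty}$. Notice that $B^s_{\infty,\infty}$ is compactly embedded in $C(\mathbb{R}^d)$.

The quasi-norm in $F^s_{p,q}(\mathbb{R}^d)$ is defined by

$$\|f\|_{F^s_{p,q}(\mathbb{R}^d)} := \|f\|_{L_p(\mathbb{R}^d)} + |f|_{F^s_{p,q}(\mathbb{R}^d)}. \tag{4.6}$$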

5 Approximation of functions with lower smoothness 

The estimate (3.26) is unsatisfactory because it can be applied only to a small subset of functions in $L_p$. In this section, we shall remove this deficiency. Our method for doing this is an 'interpolation of operators' type argument which decomposes a general function into a smooth part to which (3.26) can be applied and a second nonsmooth part which is small. We shall restrict our discussion to the case where the operator $T$ is an elliptic differential operator of order $\kappa$ with constant coefficients and $\phi$ is its fundamental solution (on $\mathbb{R}^d$). For such $T$, we can choose the wavelet system in the previous section to satisfy

$$|T(\psi_v)(x)| \le C'\ell(v)^{-\kappa}\chi_{\bar I_v}(x), \qquad v\in V. \tag{5.1}$$

^1 The basic requirement is that $m > s$. Additional requirements are imposed in case $p < 1$ or $q < 1$.
For us, the only thing that matters is the existence of some wavelet system that can be used to define the
Triebel-Lizorkin space $F^s_{p,q}$. In particular, we can, and do, allow the wavelet system to depend on $s, p, q$.
We fix $1 < p < \infty$ and continue to work under assumptions A1-4. We recall that in the present case $W(L_p(\mathbb{R}^d),\phi)$ is the Sobolev space $W^\kappa(L_p(\mathbb{R}^d))$. Then, each $f \in L_p(\mathbb{R}^d)$ has a series representation

$$f = \sum_{v\in V} f_v\psi_v,$$

with the sum convergent in $L_p(\mathbb{R}^d)$. We shall work in this section exclusively with $L_\infty$-normalized wavelets $\psi_v$.

Now, let $f$ be a function in the Triebel-Lizorkin space $F^s_{p,\infty}$, $s < \kappa$, and let $h$ be the density function for $\Xi$. We decompose $f = f_h^+ + f_h^-$ in the following way:

$$f_h^+ := \sum_{\ell(v)\ge h(v)}\psi_v f_v, \qquad f_h^- := f - f_h^+,$$

where

$$h(v) := \max_{t\in\bar I_v}h(t).$$

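We shall first estimate how well $f_h^+$ can be approximated by elements from $S_\Xi(\phi)$.
Lemma 5.1 Let $1 \le p \le \infty$ and $0 < s < \kappa$. If $f \in F^s_{p,\infty}$, then for $w := H^{-s}$, we have

$$\operatorname{dist}(f_h^+, S_\Xi(\phi))_{L_p(w)} \le C\|f\|_{F^s_{p,\infty}}, \tag{5.2}$$

with a constant $C$ as in Theorem 3.1.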
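Proof: From (5.1), we obtain for any $x \in \mathbb{R}^d$,

$$|h^{\kappa-s}(x)T(f_h^+)(x)| \le h^{\kappa-s}(x)\sum_{\ell(v)\ge h(v)}|f_v|\,|T(\psi_v)(x)| \le C\,h^{\kappa-s}(x)\sum_{\ell(v)\ge h(v)}|f_v|\,\ell(v)^{-\kappa}\chi_{\bar I_v}(x)$$

$$\le C\,M_{s,\infty}(f)(x)\,h(x)^{\kappa-s}\sum_{\bar I_v\ni x,\ \ell(v)\ge h(x)}\ell(v)^{s-\kappa} \le C\,M_{s,\infty}(f)(x)\,h(x)^{\kappa-s}h(x)^{-\kappa+s} \le C\,M_{s,\infty}(f)(x). \tag{5.3}$$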

Here we have used the fact that there is an absolute constant $C_1$ depending only on the support size of the wavelet such that for any dyadic level $j$, there are at most $C_1$ cubes $\bar I_v$, $I_v \in \mathcal{D}_j$, which contain the given point $x$. This means that the above series can be compared with a geometric series and can be bounded by a fixed multiple of its largest term. Thus, the constant $C$ depends only on $s$ and $d$. Therefore, by Theorem 3.3,

$$\operatorname{dist}(f_h^+, S_\Xi(\phi))_{L_p(H^{-s})} \le C\|h^{\kappa-s}T(f_h^+)\|_{L_p(\mathbb{R}^d)} \le C\|M_{s,\infty}(f)\|_{L_p(\mathbb{R}^d)} \le C\|f\|_{F^s_{p,\infty}},$$

which completes the proof of the lemma. ■
Remark. Formally, the proof given in the lemma does not cover the case $p = 1$.
The reason is that the wavelet representation of $f \in L_1(\mathbb{R}^d)$ does not always converge
to $f$. However, the lemma does extend to all $f \in L_1(\mathbb{R}^d)$, provided that we use an
inhomogeneous wavelet representation. Such a representation takes the form

$$\sum_{\ell(v)\le 2^{J}} f_v\psi_v + \sum_{\ell(v)=2^{J}} \tilde f_v\tilde\psi_v =: f_1 + f_2,$$

with the modified wavelets $\tilde\psi_v$ supported in exactly the same cube $\bar I_v$ as their original wavelet counterparts, and satisfying $|T(\tilde\psi_v)| \le C'2^{-J\kappa}\chi_{\bar I_v}$. The integer $J$ can be chosen at will. This modified expansion converges for every $f \in L_1(\mathbb{R}^d)$. We can use the above inhomogeneous wavelet expansion in the proof of the lemma, since we know (see the discussion on surface splines in §2) that the density function is bounded, which implies that the term $f_h^+$ in the decomposition of $f$ contains the entire expansion of the above $f_2$ (for a suitably large $J$ that depends on the bound we have on $h$, but on nothing else). The argument in the proof of the lemma can then be repeated verbatim for the case $p = 1$. However, the smoothness space that is characterized by the inhomogeneous expansion is the inhomogeneous Triebel-Lizorkin space. For this reason, we have stated the result with respect to the full norm $\|f\|_{F^s_{p,\infty}}$. For $1 < p < \infty$, the result is also valid with $\|f\|_{F^s_{p,\infty}}$ replaced by $|f|_{F^s_{p,\infty}}$.

We are left with bounding the $L_p(H^{-s})$-norm of $f_h^-$.

Lemma 5.2 Fix $1 \le p \le \infty$. If $0 < s < \kappa$ and $w := H^{-s}$, then

$$\|f_h^-\|_{L_p(w)} \le C(s,d)\|f\|_{F^s_{p,\infty}}, \tag{5.4}$$

where $C(s,d)$ depends only on $s$ and $d$.

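Proof: Since $f_h^- = \sum_{\ell(v)<h(v)} f_v\psi_v(x)$ and $|\psi_v(x)| \le C'\chi_{\bar I_v}(x)$ for an absolute constant $C'$, we have

$$|f_h^-(x)| \le C\,M_{s,\infty}(f)(x)\sum_{\ell(v)<h(v)}\ell(v)^{s}\chi_{\bar I_v}(x). \tag{5.5}$$

Given an $x \in \mathbb{R}^d$ for which $x \in \bar I_v$ and $\ell(v) < h(v)$, we know that there is a $t \in \bar I_v$ such that $h(v) = h(t)$. For this $t$, we have

$$H(x) \ge h(v)\Big(1+\frac{|x-t|}{h(v)}\Big)^{-\tau}. \tag{5.6}$$

Since $|x - t| \le \sqrt{d}\,\ell(\bar I_v) \le \sqrt{d}A_0\ell(v) \le \sqrt{d}A_0h(v)$, we find that $h(v) \le CH(x)$ with $C$ depending only on $\tau$, $d$ and $A_0$. Thus,

$$\sum_{\ell(v)<h(v)}\ell(v)^{s}\chi_{\bar I_v}(x) \le \sum_{\bar I_v\ni x,\ \ell(v)\le CH(x)}\ell(v)^{s} \le C'H(x)^{s}, \tag{5.7}$$

where $C'$ depends on $s$ and $d$. Here, as in the previous lemma, we compared the series in (5.7) with a geometric series and bounded it by a fixed multiple of its largest term. From (5.5) and (5.7), we obtain

$$H(x)^{-s}|f_h^-(x)| \le C\,M_{s,\infty}(f)(x), \qquad x\in\mathbb{R}^d,$$

which proves the lemma. ■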
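The comparison with a geometric series that is used in both lemmas can be checked numerically. The following sketch, which is only an illustration, sums $\ell^s$ over the dyadic side lengths $\ell = 2^{-j} \le R$ and compares the result with $R^s/(1-2^{-s})$, a fixed multiple of the largest admissible term. The function name `dyadic_sum` and the truncation parameter `levels` are hypothetical and not part of the paper.

```python
import math


def dyadic_sum(R, s, levels=200):
    """Sum of l^s over dyadic side lengths l = 2^-j <= R, truncated after `levels` terms."""
    # Smallest integer j with 2^-j <= R.
    j0 = math.ceil(-math.log2(R))
    return sum(2.0 ** (-j * s) for j in range(j0, j0 + levels))
```

For $s > 0$ the full sum equals $2^{-j_0 s}/(1 - 2^{-s})$ with $2^{-j_0} \le R$, which is the bound $C'H(x)^s$ of (5.7) when $R$ is a multiple of $H(x)$ and the number of cubes per level is bounded.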



Combining the two lemmas, we arrive at the following result. 

Theorem 5.3 Assume that $\phi$ and $\Xi$ satisfy the assumptions A1-4 with respect to a homogeneous differential operator $T$ with constant coefficients (and degree $\kappa$). For every $1 \le p \le \infty$, for every $0 < s < \kappa$, and for every $f \in F^s_{p,\infty}$, we have

$$\operatorname{dist}(f, S_\Xi(\phi))_{L_p(H^{-s})} \le C(s,d)\|f\|_{F^s_{p,\infty}}, \tag{5.8}$$

where $C$ is as in Theorem 3.1, and $H$ is the majorant density function associated with $\Xi$.

We can also derive results for approximation in Sobolev spaces, as well as in Besov spaces. For the Sobolev space $W^s_p$, $1 < p < \infty$, $0 < s < \kappa$, we have that $W^s_p = F^s_{p,2} \subset F^s_{p,\infty}$, [31], and hence Theorem 5.3 implies that

$$\operatorname{dist}(f, S_\Xi(\phi))_{L_p(H^{-s})} \le C\|f\|_{W^s_p(\mathbb{R}^d)}. \tag{5.9}$$

As to Besov spaces, since $B^s_p(L_p(\mathbb{R}^d))$ is continuously embedded into $F^s_{p,\infty}$ (see [31]), we have, by the same theorem, that, for $1 \le p < \infty$ and $0 < s < \kappa$,

$$\operatorname{dist}(f, S_\Xi(\phi))_{L_p(H^{-s})} \le C\|f\|_{B^s_p(L_p(\mathbb{R}^d))}. \tag{5.10}$$

This latter statement extends trivially to $p = \infty$, since $F^s_{\infty,\infty} = B^s_\infty(L_\infty)$.

Note that we have restricted our attention to a differential operator $T$. The reason
is that our analysis depends on two properties of the wavelet system. The first is the
characterization of Triebel-Lizorkin spaces in terms of wavelet decompositions, and the
second is (5.1). In order to extend the result of this section to more general $T$, we will need
a representation system that will satisfy, first and foremost, a bound analogous to (5.1)
with respect to the more general $T$. We can then define, using that system, smoothness
spaces that are analogous to the Triebel-Lizorkin spaces from §4, and establish an analog
of Theorem 5.3.
6 Nonlinear approximation



We shall now turn to a different setting. We assume $f \in L_p(\mathbb{R}^d)$ is a function that we wish to approximate using the shifts of $\phi$. In contrast to the problems studied so far where the set of centers is prescribed in advance, we shall now allow the choice of the centers to be made dependent on $f$. We are interested in how well we can approximate $f$ using at most $N$ such centers.

Let $\Sigma_N := \Sigma_N(\phi)$ be the set of all functions $S$ for which there is a set $\Xi$ of cardinality $N$ such that

$$S = \sum_{\xi\in\Xi}a_\xi\phi(\cdot - \xi). \tag{6.1}$$

We then define the approximation error

$$\sigma_N(f)_p := \inf_{S\in\Sigma_N}\|f - S\|_{L_p(\mathbb{R}^d)}. \tag{6.2}$$

This form of nonlinear approximation is known as $N$-term approximation.
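To make the form (6.1) concrete, the following minimal sketch evaluates a function $S = \sum_\xi a_\xi\phi(\cdot-\xi)$ at a point for a finite center set. As an assumption made only for this illustration, $\phi$ is taken to be the two-dimensional thin-plate spline $\phi(x) = |x|^2\log|x|$; the names `thin_plate` and `evaluate` are hypothetical.

```python
import math


def thin_plate(r):
    """phi(x) = |x|^2 log|x| as a function of r = |x|, with the value 0 at r = 0."""
    return 0.0 if r == 0.0 else r * r * math.log(r)


def evaluate(centers, coeffs, x, phi=thin_plate):
    """S(x) = sum over centers xi of a_xi * phi(|x - xi|)."""
    return sum(a * phi(math.dist(x, xi)) for xi, a in zip(centers, coeffs))
```

An element of $\Sigma_N$ is then described by $N$ centers and $N$ coefficients; the nonlinearity of the approximation problem comes from the freedom to choose the centers depending on $f$.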

Our setting is different from that considered in previous sections. We begin with a
function $\phi$ that satisfies assumption A2. For simplicity, we also suppose that the operator
$T$ is a homogeneous differential operator of order $\kappa$. The reader can easily abstract the
conditions we use about $T$ to get a more general theorem.

We do not assume A1, A3, A4 since $\Xi$ is not specified in advance. Rather, we shall assume that for any dyadic cube there is a collection of points near this cube that locally satisfy A3-4. To make this assumption precise, we fix the order $\kappa$ of the differential operator $T$ and fix a wavelet system $\{\psi_v\}_{v\in V}$ of $\kappa$-times differentiable compactly supported wavelets of the form described in §4. As in that section, for each $v$, we denote by $\bar I_v$ the smallest cube which contains the support of $\psi_v$. Recall that we know that $\bar I_v$ has size comparable to that of $I_v$, namely $\ell(\bar I_v) \le A_0\ell(v)$ for a fixed constant $A_0$ that depends only on $\kappa$.

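We make the following assumption (about $\phi$) in this section.

A5: Given $\nu > d$, there is an absolute constant $C_0$ and an integer $N_0$ such that for any $N \ge N_0$, and any $v \in V$, there is a set $\Xi_{v,N} \subset \mathbb{R}^d$ consisting of $N$ points with the following property: There is a linear combination

$$K_{v,N}(x,t) := \sum_{\xi\in\Xi_{v,N}}A_{v,N}(t,\xi)\phi(x-\xi),$$

with $A_{v,N}(\cdot,\xi)$ in $L_\infty(\bar I_v)$ such that $\phi(x-t) - K_{v,N}(x,t)$ satisfies

$$|\phi(x-t)-K_{v,N}(x,t)| \le C_0h_{v,N}^{\kappa-d}\Big(1+\frac{|x-t|}{h_{v,N}}\Big)^{-\nu}, \qquad t\in\bar I_v,\ x\in\mathbb{R}^d, \tag{6.3}$$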
with

$$h_{v,N} := \ell(v)N^{-1/d}.$$

The new assumption A5 can be easily shown to be satisfied for surface splines (Example 2 in §2). Indeed, given $N$, we choose $m$ such that $m^d \le N$, and choose $\Xi_{v,N}$ to be the vertices of a uniform grid on $\bar I_v$ with mesh size $\ell(\bar I_v)/(m-1)$. Assuming that $m \ge \kappa - d + \nu$ (see Example 2), it is then easy to follow the dilation argument given in Example 2 to conclude that the linear functionals $\lambda_t$, $t \in \bar I_v$, are uniformly bounded, independently of $I_v$ and $m$ (hence of $N$). Also, the mesh size above is bounded by $Ch_{v,N}$. Thus, (6.3) follows from the validity of A4 for surface splines (see Example 2 in §2). The constant $C$ depends only on $d$ and the number $A_0$, hence is universal for all of our purposes. Thus, $N_0$ in this case is $(\kappa - d + \nu)^d$.

To prove theorems about $N$-term approximation of a function $f$ by shifts of the function $\phi$, we will first represent $f$ in a wavelet decomposition and approximate the individual terms in this decomposition. We begin by seeing how well we can approximate the wavelet $\psi_v$ (normalized in $L_\infty$), by using a budget of $N_v$ centers. Given $v$ and any integer $N_v \ge N_0$, we let $\Xi_{v,N_v}$ be the set of points satisfying A5. Let us denote

$$S_{v,N_v} := \int T(\psi_v)(t)K(\cdot,t)\,dt, \qquad K(x,t) := \sum_{\xi\in\Xi_{v,N_v}}A(t,\xi)\phi(x-\xi). \tag{6.4}$$

Notice that since $T(\psi_v)(t) = 0$, $t \notin \bar I_v$, the integral in (6.4) only goes over $t \in \bar I_v$ and therefore by A5, this integral is finite.

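Now, an important property of the wavelets that we need here is that $\psi_v$ satisfies $\|T\psi_v\|_{L_\infty} \le C_0\ell(v)^{-\kappa}$. Also, thanks to A2, $\psi_v = \int T(\psi_v)(t)\phi(\cdot - t)\,dt$. We use this together with A5 to derive the following bound:

$$|\psi_v(x) - S_{v,N_v}(x)| \le \int_{\bar I_v}|T(\psi_v)(t)|\,|\phi(x-t) - K(x,t)|\,dt \le CN_v^{-\kappa/d}\Big(1+\frac{\operatorname{dist}(x,\bar I_v)}{\ell(v)}\Big)^{-\nu}. \tag{6.5}$$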

Let us now describe our approximation algorithm and analyze its performance. In the algorithm, we are given a budget $N$ of centers, and invest a nominal amount $c_v \ge 0$ in each wavelet $\psi_v$, $v \in V$; we refer to $c_v$ as cost. We ensure that the total cost $\sum_v c_v$ does not exceed the given budget $N$. Since $c_v$ may not be an integer, and since the minimal number of centers that we can use is $N_0$, the cost $c_v$ allows us to approximate the term $f_v\psi_v$ in the wavelet expansion (4.3) of $f$ by investing

$$N_v := \lfloor c_v\rfloor \tag{6.6}$$

centers, provided $c_v \ge N_0$. In this case, (6.5) gives us the estimate

$$|\psi_v(x)-S_{v,N_v}(x)| \le CN_v^{-\kappa/d}\Big(1+\frac{\operatorname{dist}(x,\bar I_v)}{\ell(v)}\Big)^{-\nu} \le C'c_v^{-\kappa/d}\Big(1+\frac{\operatorname{dist}(x,\bar I_v)}{\ell(v)}\Big)^{-\nu}. \tag{6.7}$$

If $c_v < N_0$, we do not approximate $\psi_v$ at all, and get then

$$|\psi_v(x)-S_{v,N_v}(x)| = |\psi_v(x)| \le C\chi_{\bar I_v}(x), \qquad x\in\mathbb{R}^d, \tag{6.8}$$

since the wavelets are uniformly bounded.
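The passage from costs to numbers of centers can be sketched as follows. This is only an illustration of the rounding rule (6.6) together with the threshold $N_0$; the function name `centers_per_wavelet`, the use of a dictionary keyed by wavelet index, and the argument name `n0` are assumptions made for the example.

```python
import math


def centers_per_wavelet(costs, n0):
    """Map each cost c_v to N_v = floor(c_v) if c_v >= n0, and to 0 otherwise."""
    # A wavelet whose cost is below n0 receives no centers and is left unapproximated.
    return {v: (math.floor(c) if c >= n0 else 0) for v, c in costs.items()}
```

Since $N_v \le c_v$ in every case, the number of centers actually used never exceeds the total cost, and hence never exceeds the budget $N$.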

Now, suppose that we are given a budget of $N$ centers, determine a cost distribution
$(c_v)_v$ (with $\sum_v c_v \le N$), and would like to estimate the $L_p$-norm of the error when
approximating the term $f_v\psi_v$ in the wavelet expansion of $f$ using the designated cost $c_v$.
According to (6.7) and (6.8), our error will be determined, up to a universal constant, by
the $p$-norm of

V V 

Here, $(f_v)$ are the wavelet coefficients of the approximand $f$.

Lemma 6.1 Let $1 \le p \le \infty$ and suppose that the constant $\nu$ appearing in A5 is $> 2d$.
If $R_v$ is defined as in (6.9), then





< c 


5^[max(l,c„)]-'^/'^|/„|X7. 




(6.10) 






vev 







Proof: This inequality can be derived using the method of proof for Fefferman-Stein inequalities [16]. In fact, for $1 < p < \infty$, it can be derived directly from these inequalities as follows. Let $M_0$ be the Hardy-Littlewood maximal operator

$$M_0(g)(x) := \sup_{Q\ni x}\frac{1}{|Q|}\int_Q|g(u)|\,du, \tag{6.11}$$

where the supremum is taken over all cubes $Q$ that contain $x$. Then, for $1 < p < \infty$ and any real numbers $(a_v)$, the Fefferman-Stein inequality says







< c 




1 


(6.12) 




Lp(]R 




vev 







where $C$ depends only on $p$ as $p$ gets close to $1$ and $\infty$. Now a direct calculation shows
that for a constant $C_0$ depending only on $d$ we have

M„to.)(.) > Co (l + ^?!f|i«) " > C; (l + '^^y (6.13) 
where, in the last inequality, we used our assumption that $\nu > 2d$ and the fact that
1 + ^^^j^ < C{1 + ^^^j^) for all X eR'^ with a constant C depending only on the 
space dimension d. It follows from this and (6.5), (6.8) that 

R,{x) < C[max(l,c,)]-'^/'^Mo(x/J(x), x e R", 

with $C$ again depending only on $d$. Using this with (6.12) we derive the lemma for $1 < p < \infty$.

One can derive the lemma in the case $p = 1$ and also obtain a constant not depending on $p$ as $p \to 1$ by using a modified Hardy-Littlewood maximal function

$$M_\mu(g)(x) := \sup_{Q\ni x}\Big(\frac{1}{|Q|}\int_Q|g(u)|^\mu\,du\Big)^{1/\mu}$$

with $\mu < 1$. The Fefferman-Stein inequality now holds for $p = 1$ if this new maximal function is used in place of $M_0$. For $M_\mu$ one has an analogue of (6.13) where in the first inequality the exponent $d$ is replaced by $d/\mu$. Thus if $\mu$ is sufficiently close to one so that $\mu(\nu - d) > d$, we again arrive at the lemma for $p = 1$. When $p = \infty$, one can again derive the lemma using the fact that $\nu > 2d$ by an analogous argument to the proof of (6.12). ■



The following is the main result of this section. 

Theorem 6.2 Let $1 \le p < \infty$ be given, and let $f \in F^s_{\tau,q}$, with $s < \kappa$, $\tau := (1/p + s/d)^{-1}$, and $q := (1 + s/d)^{-1}$. Then

$$\sigma_N(f)_p \le CN^{-s/d}\|f\|_{F^s_{\tau,q}}.$$

The constant $C$ here is independent of $f$ and $N$.
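The relations between the indices $p$, $s$, $d$, $\tau$ and $q$ that are used repeatedly in the proof can be verified with exact rational arithmetic. The sketch below is only a sanity check of these identities; the function name `exponents` is a hypothetical helper, and rational values of $p$ and $s$ are assumed so that the comparison is exact.

```python
from fractions import Fraction


def exponents(p, s, d):
    """Return (tau, q) with tau = (1/p + s/d)^-1 and q = (1 + s/d)^-1."""
    tau = 1 / (Fraction(1) / p + Fraction(s) / d)
    q = 1 / (1 + Fraction(s) / d)
    return tau, q
```

For instance, with $p = 2$, $s = 1$, $d = 2$ one finds $\tau = 1$ and $q = 2/3$, and the identities $q = 1 - qs/d$ and $\tau/p - q = -(\tau - q)s/d$ hold; for $p = 1$ the two indices coincide.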

In the proof of the theorem, we will use the following elementary observation. 

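Lemma 6.3 Let $\sum_{j=-\infty}^{\infty}z_j$ be a non-negative series with limit $Z < \infty$, and with partial sum sequence $Z_j := \sum_{i=-\infty}^{j}z_i$. Then, for any $\varepsilon > 0$, there is a $C_\varepsilon > 0$ depending only on $\varepsilon$ such that

$$\sum_{j=-\infty}^{\infty}\frac{z_j}{Z_j^{1-\varepsilon}} \le C_\varepsilon Z^\varepsilon.$$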



Proof of the lemma: For each positive integer $k$, let $j_k$ be the minimal integer for which $Z_{j_k} > 2^{-k}Z$, and set $j_0 := \infty$. Then

$$\sum_{j=j_k}^{j_{k-1}-1}\frac{z_j}{Z_j^{1-\varepsilon}} \le 2^{k(1-\varepsilon)}Z^{\varepsilon-1}\,2^{-(k-1)}Z = 2Z^\varepsilon 2^{-k\varepsilon}.$$

Summing over all positive $k$, we obtain the stated result.
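The lemma can be tested numerically on finite sequences. Summing the geometric series in the proof gives the admissible constant $C_\varepsilon = 2/(2^\varepsilon - 1)$ for $0 < \varepsilon < 1$; the sketch below computes the left side for a given finite non-negative sequence so that it can be compared with $C_\varepsilon Z^\varepsilon$. The function name `weighted_sum` is hypothetical.

```python
import random


def weighted_sum(z, eps):
    """Return (sum_j z_j / Z_j^(1 - eps), Z) for the partial sums Z_j of a finite sequence z >= 0."""
    total, partial = 0.0, 0.0
    for t in z:
        partial += t
        # Terms with z_j = 0 contribute nothing; skipping them avoids 0/0 at the start.
        if t > 0.0:
            total += t / partial ** (1.0 - eps)
    return total, partial
```

A typical check draws random terms, for example `z = [random.random() ** 4 for _ in range(1000)]`, and compares the first returned value with `2.0 / (2.0 ** eps - 1.0) * Z ** eps`.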



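Proof of the theorem: Fix $f \in F^s_{\tau,q}$ with $f$ not the zero function, and fix a positive integer $N$. For any given $x \in \mathbb{R}^d$, we consider the set $V_x$ of all $v \in V$ such that $\chi_{I_v}(x) \ne 0$. We can order the $v \in V_x$ as follows. We take any fixed ordering for $E$ and then we say $v > v'$ if either $|v| > |v'|$, or $|v| = |v'|$ and $e_v > e_{v'}$. Given this order, we now define for each $x \in \mathbb{R}^d$ and each $v' \in V$,

$$M_{q,v'}(x) := \Big(\sum_{v\in V_x:\ v\ge v'}\ell(v)^{-sq}|f_v|^q\Big)^{1/q}.$$

Notice that $M_{q,v'}(x)$ is actually constant on $I_{v'}$ and so we denote

$$M_{q,v'} := M_{q,v'}(x), \qquad x\in I_{v'}.$$

Also, we clearly have

$$M_{q,v'} \le M_q(f)(x), \qquad x\in I_{v'}. \tag{6.14}$$

We determine our cost function by the rule

$$c_v := a\,M_{q,v}^{\tau-q}\,|v|^{q}\,|f_v|^q, \tag{6.15}$$

where $a$ will be specified in a moment. Now $\tau - q \ge 0$ and $q = 1 - qs/d$. Therefore, from (6.15), we see that

$$c_v = a\,M_{q,v}^{\tau-q}\int_{\mathbb{R}^d}\ell(v)^{-sq}|f_v|^q\chi_{I_v}(x)\,dx \le a\int_{\mathbb{R}^d}M_q(f)(x)^{\tau-q}\,\ell(v)^{-sq}|f_v|^q\chi_{I_v}(x)\,dx.$$

Thus,

$$\sum_{v\in V}c_v \le a\int_{\mathbb{R}^d}M_q(f)(x)^{\tau-q}M_q(f)(x)^{q}\,dx = a\|M_q(f)\|^{\tau}_{L_\tau(\mathbb{R}^d)}, \tag{6.16}$$

where we have used the fact that $\chi_{I_v} \le \chi_{\bar I_v}$, $v \in V$. Thus, we can choose $a$ so that $a\|f\|^{\tau}_{F^s_{\tau,q}} = N$ and obtain that $\sum_v c_v \le N$.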

It remains to estimate the $L_p$-error produced by the scheme. In view of Lemma 6.1, we need to estimate the $L_p$-norm of

$$\sum_{v\in V}[\max(1,c_v)]^{-\kappa/d}|f_v|\chi_{I_v} \le \sum_{v\in V}[\max(1,c_v)]^{-s/d}|f_v|\chi_{I_v} \le \sum_{v\in V}c_v^{-s/d}|f_v|\chi_{I_v}.$$

Here we have used the fact that $s < \kappa$.
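For $x \in I_v$, we have

$$E_v(x) := c_v^{-s/d}|f_v|\chi_{I_v}(x) = a^{-s/d}M_{q,v}^{\tau/p-q}|v|^{-sq/d}|f_v|^q.$$

Here, we have used the fact that $1 - qs/d = q$, together with the identity $(\tau - q)s/d = q - \tau/p$. If $p = 1$, then $\tau/p - q = 0$, and, fixing $x$, we obtain

$$\sum_{v\in V}E_v(x) \le a^{-s/d}M_q^{q}(f)(x) = a^{-sp/d}M_q^{\tau}(f)(x).$$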

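We can prove a similar estimate when $p > 1$. Namely, we fix $x \in \mathbb{R}^d$ and invoke Lemma 6.3 with $z_v := |v|^{-sq/d}|f_v|^q\chi_{I_v}(x)$ using the ordering on $V_x$ introduced earlier. Hence, $M_{q,v}^q = Z_v$ and $M_{q,v}^{\tau/p-q} = Z_v^{\varepsilon-1}$, with $\varepsilon = \tau/(pq) > 0$. Also, $Z \le M_q^q(f)(x)$, again because $\chi_{I_v} \le \chi_{\bar I_v}$. By the lemma,

$$\sum_{v\in V_x}|v|^{-sq/d}|f_v|^qM_{q,v}^{\tau/p-q} \le C(\tau,p,q)M_q(f)(x)^{\tau/p}.$$

Thus,

$$\Big(\sum_{v\in V}E_v(x)\Big)^p \le Ca^{-sp/d}M_q(f)^{\tau}(x),$$

and we conclude that, with $A := \|f\|_{F^s_{\tau,q}}$,

$$\Big\|\sum_{v\in V}E_v\Big\|_{L_p(\mathbb{R}^d)} \le Ca^{-s/d}\|M_q(f)\|^{\tau/p}_{L_\tau(\mathbb{R}^d)} \le Ca^{-s/d}A^{\tau/p} = CN^{-s/d}A. \qquad ■$$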

We can derive from the theorem a corresponding result for the Besov space $B^s_q(L_\tau)$ (see any of the standard texts for a definition of these spaces). This Besov space is continuously embedded in $F^s_{\tau,q}$. Hence we obtain

Corollary 6.4 Let $1 \le p < \infty$ be given, and let $f \in B^s_q(L_\tau)$, with $s < \kappa$, $\tau := (1/p + s/d)^{-1}$, and $q := (1 + s/d)^{-1}$. Then

$$\sigma_N(f)_p \le CN^{-s/d}\|f\|_{B^s_q(L_\tau)}.$$

The constant $C$ here is independent of $f$ and $N$.

Finally, we compare this theorem with the classical results on $N$-term wavelet approximation given in [9]. For wavelet approximation one obtains the same bounds with the assumption $f \in B^s_\tau(L_\tau) = F^s_{\tau,\tau}$. Since $q \le \tau$, the wavelet assumption is (slightly) weaker than what is assumed in Theorem 6.2. The two assumptions agree when $p = 1$. We do not know if $q$ can be replaced by $\tau$ for other values of $p$. We believe that it cannot, i.e., that the value of $q$ in our theorems is the best possible one.